Move container build infrastructure to Ansible #1009

ikerexxe · 2024-05-29T14:34:03Z

Building the CI containers with Dockerfiles prevented us from getting the logs when any of the build steps failed. Using Ansible will help us tackle this problem while still keeping a certain independence from GitHub Actions.

Using a Dockerfile to build, install and test the code can be problematic, because we can't capture the log files to see what went wrong when a step fails. This PR converts the Fedora Dockerfile to Ansible, an open source IT automation tool. Both developers and the CI system can use the tool to check whether a piece of code can be built, installed and tested. This is the first patch in a series in which I will convert the existing PR workflows to use Ansible instead of Dockerfiles.

Signed-off-by: Iker Pedrosa <[email protected]>

alejandro-colomar · 2024-05-29T19:01:01Z

These ansible(1) playbooks still build Docker images, so I guess under the hood they still do the same thing that a Dockerfile and docker(1) could do. Can't we achieve the same with just docker(1) and a Dockerfile?

That is, can you explain why ansible(1) is necessary (or better than docker(1)) to achieve this?

ikerexxe · 2024-05-30T08:52:25Z

Yes, they do the same, and if I could I would have stuck with the docker and Dockerfile architecture, but it isn't possible. At least not if we want to obtain the logs without having to do some nasty things, like 1 and 2 (which in reality doesn't work, because when something fails the container build never reaches that point).

In addition, when a Dockerfile build fails the container is destroyed, which prevents us from accessing it to see what happened. That brings us back to the previous point: getting the logs.

Using Ansible allows us to keep an approach similar to what we had until now: build this project for different distributions, run the workflow locally, maintain some independence from GitHub Actions, etc. On top of that, we'll be able to obtain all the logs, and if we are running the workflow locally and anything fails we'll be able to access the containers to debug what's happening.

alejandro-colomar · 2024-05-30T09:44:21Z

> Yes, they do the same, and if I could I would have stuck with the docker and Dockerfile architecture, but it isn't possible. At least not if we want to obtain the logs without having to do some nasty things, like 1 and 2 (which in reality doesn't work, because when something fails the container build never reaches that point).

This does actually work (I could see the logs). I've seen it fail once (not exactly this; it was the same trick, but in a GitHub Action), and documented it here:
#956 (comment)

And I wrote some small Dockerfile to test this with a docker build:

$ cat Dockerfile 
FROM debian

RUN bash -c "trap 'cat </etc/os-release >&2' ERR; false;"

$ sudo docker build .
Sending build context to Docker daemon  23.55kB
Step 1/2 : FROM debian
 ---> 2a033a8c6371
Step 2/2 : RUN bash -c "trap 'cat </etc/os-release >&2' ERR; false;"
 ---> Running in 3c6f20ae7c6b
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
The command '/bin/sh -c bash -c "trap 'cat </etc/os-release >&2' ERR; false;"' returned a non-zero code: 1

Or do you have any case where it failed and it didn't work?

Edit: Now I realize you probably meant that the COPY trick is the one that doesn't work. That's probably true. But stderr should be good enough.

I want to avoid the complexity of using ansible(1) if possible, and if trap(1) works, I prefer it.

alejandro-colomar · 2024-05-30T09:50:10Z

As an alternative, we could have make check print directly to stderr instead of to a log, and let users redirect the output if they want to. This is very much the preferred behavior, IMO. Then we wouldn't need trap(1) magic to print the logs on failure.
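A minimal sketch of what that would look like from the caller's side. `run_checks` is a hypothetical stand-in for a `make check` that reports on stderr, and the file name is made up; the point is only that the caller, not the test suite, decides whether the report ends up in a file:

```shell
#!/bin/sh
# Hypothetical stand-in for a `make check` that reports on stderr.
run_checks() {
    echo "PASS: test_one" >&2
    echo "FAIL: test_two" >&2
    return 1
}

logdir=$(mktemp -d)

# The caller chooses where the report goes; here it is saved to a file.
# Without the redirection it would simply show up in the build output.
status=0
run_checks 2>"$logdir/check.log" || status=$?
```

The exit status is still available in `status`, so the caller can fail the job after saving the report.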

ikerexxe · 2024-05-30T14:09:40Z

The trap works, but it feels nasty and it forces users to scroll through the GitHub log view, which doesn't work very well.

Regarding COPY, that doesn't work, and we want to see not only the build logs but also the configuration files. This would feel like an improvement to me considering our current situation.

Finally, this would also give us the opportunity to capture all the test logs when #835 is implemented and running.

alejandro-colomar · 2024-05-30T16:10:40Z

share/ansible/roles/ci_run/tasks/fedora.yml

+    make check
+  args:
+    chdir: /usr/local/src/shadow/
+  ignore_errors: true


What exactly does this mean? Why do we want to ignore errors?

By default, if an Ansible action fails, the execution stops. I want to ignore the errors and continue, so that the last action, which copies the logs from the container to the host system, still runs. This way we can gather them for inspection.

But does it report the error later? How will we know if ansible failed, if we ignore the errors? Sorry if this is obvious; I never used Ansible before.

The error is reported, but the execution continues. At the end of the ansible execution there's the PLAY RECAP where we'll be able to check if anything failed. I created #1014 with an intentional failure to show how it works.
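The same pattern, sketched in plain shell rather than Ansible: record the failure, keep going so the logs get copied, and report at the end. Everything here is hypothetical: `run_checks` stands in for `make check`, and the two temporary directories play the roles of the container and the host.

```shell
#!/bin/sh
# Hypothetical stand-in for `make check`: writes a log file, then fails.
run_checks() {
    echo "FAIL: test_example" >"$1/test-suite.log"
    return 1
}

workdir=$(mktemp -d)   # plays the role of the container
hostdir=$(mktemp -d)   # plays the role of the host

# Record the failure instead of stopping, similar in spirit to
# `ignore_errors: true`.
status=0
run_checks "$workdir" || status=$?

# This step always runs, so the log reaches the host even after a failure.
cp "$workdir/test-suite.log" "$hostdir/"

# Report the recorded result at the end; a real script would finish
# with `exit "$status"` so the job is still marked as failed.
if [ "$status" -ne 0 ]; then
    echo "checks failed with status $status; see $hostdir/test-suite.log" >&2
fi
```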

My bad, I forgot to set `if: always()` in the GitHub Action. It's fine now.

Thanks! It seems to work now. BTW, dumb question: how do I find and read the logs (artifacts)?

You need to open the action that failed, then click on Summary, and finally scroll down to find all the artifacts.

Ughhh, and then download a .zip and extract it to find the logs. Can we (also) have a copy on stderr? I very much prefer scrolling. :)

That will work until we start running the new tests and have several files to review.

alejandro-colomar · 2024-05-30T16:12:14Z

share/ansible/roles/ci_run/README.md

+Usage example:
+
+    - hosts: builder
+      connection: podman
+      become: true
+      roles:
+        - role: ci_run


I don't understand this usage. Where is this YAML code supposed to be used?

In the ansible playbook: https://github.com/shadow-maint/shadow/pull/1009/files#diff-1968d6dc5c6937169e73de5fa2e83bf0d9810afcec288aa45b6cfabb40a5bf15R12-R17

Roles are self-contained units in Ansible. They are used for grouping tasks and other resources in a known file structure.

alejandro-colomar · 2024-06-05T14:51:14Z

.github/workflows/runner.yml

    - name: Build container
      run: |
-        docker buildx build -f ./share/containers/${{ matrix.os }}.dockerfile . --output build-out


BTW, what if we mapped a volume and wrote the log files to it with docker(1)?

RUN bash -c "trap 'cat <tests/unit/test-suite.log | tee /some/mapped/volume/test-suite.log >&2' ERR; make check;"

And use https://docs.docker.com/storage/volumes/.

That would allow storing the artifacts without needing Ansible.

You can't mount volumes while building a container image, and that's exactly what we are doing with the Dockerfile. I have tried to solve this problem in various ways, and with the technology we use it is not possible 😓

Hmmm; understood.

How about something like this?

docker build ... | tee >(sed -n '/BEGIN MARKER/,/END MARKER/p' >/host/log/file)

(this probably needs to redirect stderr too, or only stderr)

We would only need to find some consistent markers in the log.

So, the full docker logs would go to the output, and the specific error logs that we want, delimited by those markers, would go to a specific file on the host (that is, the runner).

Same answer as in #1009 (comment)

If each log output has a different marker, you can parse each one separately:

docker build ... \
| tee \
    >(sed -n '/BEGIN A/,/END A/p' >/host/A.log) \
    >(sed -n '/BEGIN B/,/END B/p' >/host/B.log) \
    >(sed -n '/BEGIN C/,/END C/p' >/host/C.log);
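A minimal, runnable sketch of the marker idea, assuming the build output wraps each log in consistent BEGIN/END lines. `fake_build` is a hypothetical stand-in for `docker build ...`, and the marker and file names are made up. This variant saves the full output with tee and extracts the sections afterwards instead of using process substitution, which sidesteps any question about when the `>(...)` subprocesses finish writing:

```shell
#!/bin/sh
# Hypothetical stand-in for `docker build ...`: prints ordinary build
# output plus two log sections wrapped in the agreed markers.
fake_build() {
    echo "Step 1/2 : FROM debian"
    echo "BEGIN A"
    echo "config.log contents"
    echo "END A"
    echo "Step 2/2 : RUN make check"
    echo "BEGIN B"
    echo "test-suite.log contents"
    echo "END B"
}

outdir=$(mktemp -d)

# Keep the complete output visible and also store it in one file.
fake_build 2>&1 | tee "$outdir/full.log"

# Extract each marked section once the build has finished.
sed -n '/^BEGIN A$/,/^END A$/p' "$outdir/full.log" >"$outdir/A.log"
sed -n '/^BEGIN B$/,/^END B$/p' "$outdir/full.log" >"$outdir/B.log"
```

Note that in a plain sh pipeline like this the exit status is that of tee, not of the build, so a real script would have to capture the build result separately.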

ikerexxe · 2024-06-13T07:56:17Z

I think this is ready for review, so I'm moving it out of draft state.

hallyn

Cool, thanks.

github-advanced-security bot found potential problems May 29, 2024

ikerexxe force-pushed the ansible_ci branch from e3acb8f to 1e129c2 Compare May 29, 2024 14:50

ikerexxe force-pushed the ansible_ci branch from a1e27c7 to 1e129c2 Compare May 30, 2024 14:04

alejandro-colomar reviewed May 30, 2024

alejandro-colomar reviewed May 30, 2024

alejandro-colomar reviewed May 30, 2024

alejandro-colomar reviewed May 31, 2024

ikerexxe added 2 commits June 5, 2024 12:30

share/ansible: create roles

76fd1a3

Create `build_container` and `ci_run` roles and move the fedora target to them. Signed-off-by: Iker Pedrosa <[email protected]>

share/.gitignore: add build-out folder

fbaa123

Signed-off-by: Iker Pedrosa <[email protected]>

ikerexxe force-pushed the ansible_ci branch 2 times, most recently from c8ceb95 to db3c30c Compare June 5, 2024 10:49

alejandro-colomar reviewed Jun 5, 2024

alejandro-colomar reviewed Jun 5, 2024

alejandro-colomar reviewed Jun 5, 2024

ikerexxe added 4 commits June 5, 2024 13:35

share/ansible: move fedora ci_run to its own file

06218a4

Signed-off-by: Iker Pedrosa <[email protected]>

share/ansible: convert debian dockerfile to ansible

26002a7

Signed-off-by: Iker Pedrosa <[email protected]>

share/ansible: convert alpine dockerfile to ansible

7f9011a

Signed-off-by: Iker Pedrosa <[email protected]>

share/ansible: implement distribution selection

be74c5f

Distribution to run can be selected when running `ansible-playbook` by appending `-e 'distribution=fedora'` to the command. Signed-off-by: Iker Pedrosa <[email protected]>

ikerexxe force-pushed the ansible_ci branch 3 times, most recently from 7a426fe to ef6fbcf Compare June 5, 2024 14:21

ikerexxe added 2 commits June 5, 2024 16:28

share/container-build.sh: update to use Ansible build

abdc5ff

Signed-off-by: Iker Pedrosa <[email protected]>

doc: update documentation to use Ansible build

d086629

Signed-off-by: Iker Pedrosa <[email protected]>

ikerexxe added 2 commits June 5, 2024 16:28

CI: use Ansible build in Github Action

a5548b9

Signed-off-by: Iker Pedrosa <[email protected]>

share/containers: remove unused dockerfiles

a42f135

Signed-off-by: Iker Pedrosa <[email protected]>

ikerexxe force-pushed the ansible_ci branch from ef6fbcf to a42f135 Compare June 5, 2024 14:28

alejandro-colomar reviewed Jun 5, 2024

ikerexxe marked this pull request as ready for review June 13, 2024 07:56

hallyn self-assigned this Jul 18, 2024

hallyn approved these changes Jul 18, 2024

hallyn merged commit fffa4d3 into shadow-maint:master Jul 18, 2024
9 checks passed

ikerexxe deleted the ansible_ci branch July 19, 2024 06:25
